Introduction

Introduction




Welcome to the tutorial for the AI Starter Kit on advanced visualisation. We will introduce you to a number of more advanced visualisation techniques. You will learn how these can be used creatively to uncover more elaborate insights from your data than would be possible with the more general out-of-the-box visualisations and plots.

Data visualisation is an important activity in several phases of a data science project. It allows you to understand several characteristics of a dataset, to discover interesting insights, to validate analysis results, to communicate these results to non-experts via intuitive dashboards, and so on. Popular visualisations often used by data scientists are boxplots, distribution plots, as well as bar and line charts. These are basic out-of-the-box visualisations that are generally applicable, but that are often hard to interpret for non-experts. And more importantly, they do not always reveal interesting insights. More advanced visualisations exploit the human eye's extraordinary visual pattern recognition abilities. A clever visualisation of data can already reveal interesting patterns and insights, even before any complex algorithm is applied. On top of that, they can help in formulating hypotheses to be validated further in the data analytics process or to identify features that can be useful for data-driven modelling.

Before we dive deeper in these more advanced visualisation techniques, let us discuss some examples. many time series show some underlying periodicity or seasonality. Depending on the length and granularity of these time series, these seasonal effects are often only hardly visible. In this Starter Kit, we will introduce to you to a number of visualisations that make it very easy to identify for example monthly patterns. Another field where advanced visualisations can help is anomaly detection. Such anomalies appear for example when a machine is failing, and are especially hard to detect when it comes to multi-dimensional data originating from multiple sensors.

In this Starter Kit, we will use advanced visualisations such as timeline plots, heatmaps, calendar maps, area plots and scatter plots to visually explore a dataset and gradually build up more knowledge about it. We use a publicly-available dataset that consists of bike counter data, which contains hourly information on the number of bikes that cross at six different spots in Seattle. By visualising this data, we will be able to: explain certain characteristics of the data, such as some nodes having more crossings than others, identify global trends, such as an increase in traffic over the years, and seasonal trends, such as fluctuating popularity within a year, recognize structural patterns, such as distinct weekday and weekend traffic patterns, and finally detect outliers, such as weekend traffic patterns that occur on weekdays.

In the next video, we will explain the data in more detail before we continue in the following videos with a number of more advanced visualisation techniques to extract insights from this data.

Authors: EluciDATA Lab

Permanent URL